Ian Campbell [Wed, 20 Apr 2011 16:13:06 +0000 (17:13 +0100)]
tools: support SeaBIOS. Use by default when upstream qemu is configured.
The SeaBIOS integration here is only semi-complete and is targeted at
developers and very early adopters who can be expected to cope with
some rough edges. In particular the user must clone, patch as
necessary and compile SeaBIOS themselves since this patchset does not
cover any of that (in the same way we currently do not integrate
upstream qemu clone+build). Include a big comment to that effect next
to the Config.mk option.
Many of the bios_config callback functions are not yet used by
SeaBIOS.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
Ian Campbell [Wed, 20 Apr 2011 16:13:06 +0000 (17:13 +0100)]
tools: libxl: write selected BIOS to xenstore.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
Ian Campbell [Wed, 20 Apr 2011 16:13:00 +0000 (17:13 +0100)]
tools: libxl: hide selection of device-model by default.
This should never have been exposed to users as something they are
required to think about, unless they want to.
At the libxl API level:
* Add libxl_device_model_info.device_model_version allowing the
user to say which qemu version (e.g. old qemu-xen or qemu
upstream) they want for a domain.
* Add libxl_device_model_info.device_model_stubdomain allowing
the user to select stub or non-stub device model
* Default the device_model field to NULL and DTRT when
building a domain with that value, but still
allow libxl users to specify something explicit if they want.
* Note that libxl_device_model_info.device_model, if specified,
must now be a complete path.
At the xl level:
* Support a new "device_model_version" option which sets the new
libxl_device_model_info.device_model_version field. This
option is mandatory if device_model_override is used.
* Support a new "device_model_stubdomain_override" option which
allows the user to request stubdomain if desired.
* WARN if an HVM guest cfg uses the "device_model" config
option, and direct users to the "device_model_override" option
if they really do not want the default. If the "device_model"
directive contains "stubdom-dm" then direct users to the
"device_model_stubdomain_override" directive.
The default qemu remains the existing qemu-xen based qemu-dm and
stubdomain defaults to off. I chose the name "qemu-xen traditional" to
refer to the existing Xen fork of qemu and simply "qemu-xen" to refer to
the new device model based on qemu upstream.
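A minimal C sketch of the xl-side check described above. The helper names and warning text are hypothetical (this is not xl's actual code), and the sketch assumes that a stub-domain device model path can be recognised by the substring `stubdom-dm`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the option xl should point the user at when
 * an HVM guest config still sets "device_model", or NULL if it is unset.
 */
static const char *device_model_replacement(const char *device_model)
{
    if (device_model == NULL)
        return NULL;                              /* option not used: no warning */
    if (strstr(device_model, "stubdom-dm") != NULL)
        return "device_model_stubdomain_override"; /* user wanted a stubdomain */
    return "device_model_override";
}

/* Print the warning (if any) for the parsed "device_model" value. */
static void warn_device_model(const char *device_model)
{
    const char *repl = device_model_replacement(device_model);

    if (repl != NULL)
        fprintf(stderr,
                "WARNING: \"device_model\" option found; use \"%s\" "
                "if you really do not want the default\n", repl);
}
```

Keeping the decision in a pure helper makes the mapping easy to check separately from the printing.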
I suspect that the vast majority of users only have these config
options because they've copied them from somewhere and they normally
have no interest in which device model is used. Renaming the fields
and warning when they are used makes these decisions internal. This
will allow us to make decisions at a platform level regarding the
preferred hvmloader, device model, stub domain etc without requiring
everyone to change their configuration files.
Adding a device model version to the API is intended to make it easy
for users to select what they need without having to know about the
paths to specific binaries etc. Most importantly it gets rid of the
parsing of the output of qemu -h...
It's not clear where upstream qemu will eventually be installed; I
went with /usr/bin/qemu for now.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
Ian Campbell [Tue, 19 Apr 2011 08:16:24 +0000 (09:16 +0100)]
tools: libxl: hide selection of hvmloader by default.
This should never have been exposed to users as something they are
required to think about, unless they want to.
At the libxl API level:
* Move libxl_domain_build_info.kernel into the PV side of the
tagged union (using this field to specify both PV kernel and
hvmloader is confusing)
* Add hvmloader (a string) to the HVM side of the tagged union.
This defaults to NULL and libxl will DTRT with that default
but still allow libxl users to specify something explicit if
they want.
At the xl level:
* WARN if an HVM guest cfg uses the "kernel" config option, and
direct users to the "hvmloader_override" option if they really
do not want the default hvmloader.
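A minimal sketch of the resulting tagged-union layout. The type and field names are simplified and hypothetical (not the real libxl structures), and the default hvmloader name is a placeholder because the installed path is not given here.

```c
#include <stddef.h>

/* Placeholder default; the real install path is not stated in this log. */
#define DEFAULT_HVMLOADER "hvmloader"

enum domain_type { DOMAIN_TYPE_HVM, DOMAIN_TYPE_PV };

/* Simplified stand-in for libxl_domain_build_info's tagged union. */
struct build_info {
    enum domain_type type;
    union {
        struct { const char *hvmloader; } hvm; /* NULL means "use the default" */
        struct { const char *kernel; } pv;     /* PV kernel image only */
    } u;
};

/* Resolve the hvmloader to use, honouring an explicit override if given. */
static const char *effective_hvmloader(const struct build_info *info)
{
    if (info->type != DOMAIN_TYPE_HVM)
        return NULL;                           /* PV guests have no hvmloader */
    return info->u.hvm.hvmloader ? info->u.hvm.hvmloader : DEFAULT_HVMLOADER;
}
```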
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
George Dunlap [Wed, 27 Apr 2011 12:36:15 +0000 (13:36 +0100)]
credit2: add a callback to migrate to a new cpu
In credit2, there needs to be a strong correlation between
v->processor and the runqueue to which a vcpu is assigned;
much of the code relies on this invariant. Allow credit2
to manage the actual migration itself.
This fixes the most recent credit2 bug reported on the list
(Xen BUG at sched_credit2.c:1606) in Xen 4.1, as well as
the bug at sched_credit2.c:811 in -unstable (which catches the
same condition earlier).
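Letting the scheduler own the migration can be sketched as an optional callback. Everything below (the types, fields and the `cpu_to_runq` table) is a hypothetical simplification, not Xen's scheduler interface; it only illustrates why updating `processor` and the runqueue in one place preserves the invariant.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified types; not Xen's struct vcpu/scheduler. */
struct vcpu {
    unsigned int processor;   /* CPU the vcpu is assigned to */
    unsigned int runq;        /* runqueue the vcpu belongs to */
};

struct scheduler {
    /* Optional hook: if set, the scheduler performs the migration itself. */
    void (*migrate)(const struct scheduler *ops, struct vcpu *v,
                    unsigned int new_cpu);
    const unsigned int *cpu_to_runq;   /* runqueue that owns each CPU */
};

/* Generic code: delegate to the scheduler when it provides a hook. */
static void vcpu_move(const struct scheduler *ops, struct vcpu *v,
                      unsigned int new_cpu)
{
    if (ops->migrate != NULL)
        ops->migrate(ops, v, new_cpu);
    else
        v->processor = new_cpu;        /* previous behaviour: processor only */
}

/* credit2-style hook: keep processor and runqueue consistent. */
static void credit2_migrate(const struct scheduler *ops, struct vcpu *v,
                            unsigned int new_cpu)
{
    v->runq = ops->cpu_to_runq[new_cpu];   /* move to the new CPU's runqueue */
    v->processor = new_cpu;                /* and update processor with it */
}
```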
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Daniel Kiper [Wed, 27 Apr 2011 12:29:14 +0000 (13:29 +0100)]
pv-grub: Fix for incorrect dom->p2m_host[] list initialization
The introduction of Linux kernel git commit
ceefccc93932b920a8ec6f35f596db05202a12fe (x86: default
CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB) revealed a
deeply hidden bug in pv-grub: during the kernel load stage, the
dom->p2m_host[] list was incorrectly initialized.
At the beginning of the kernel load stage, the dom->p2m_host[] list is
populated with the current PFN->MFN layout. Later, during memory
allocation (memory is allocated page by page in kexec_allocate()), the
page order is changed to establish a linear layout in the new domain.
This is done by exchanging subsequent MFNs with newly allocated MFNs.
The dom->p2m_host[] list is indexed by the currently requested PFN
(incremented from 0) and by the PFN of the newly allocated page. If the
PFN of the newly allocated page is less than the currently requested
PFN, an earlier allocated MFN is overwritten, which later leads to a
domain crash. This patch corrects that issue: if the PFN of the newly
allocated page is less than the currently requested PFN, the relevant
PFN/MFN pair is properly calculated and the usual exchange occurs later.
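The corrected exchange can be simulated on plain arrays. This is a sketch under stated assumptions, not the pv-grub code: `alloc_pfn[i]` stands for the original PFN of the page the allocator returns at step i, and a hypothetical `moved2pfn[]` array records, for each processed slot, where that slot's previous page was moved. A PFN below the current index is followed along this chain to the page's current slot before the usual exchange.

```c
#include <stddef.h>

/*
 * p2m[]       : new domain's PFN->MFN list, pre-filled with the current layout
 * alloc_pfn[] : original PFN of the page allocated at step i (hypothetical input)
 * moved2pfn[] : for each processed slot i, the slot its old page was moved to
 */
static void exchange_pages(unsigned long *p2m, const unsigned long *alloc_pfn,
                           unsigned long *moved2pfn, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned long new_pfn = alloc_pfn[i];

        /*
         * Slots below i were already exchanged, so p2m[new_pfn] no longer
         * holds this page's MFN. Follow earlier moves to its current slot.
         */
        while (new_pfn < i)
            new_pfn = moved2pfn[new_pfn];

        moved2pfn[i] = new_pfn;

        /* Usual exchange: slot i gets the new page, its old page moves out. */
        unsigned long old_mfn = p2m[i];
        p2m[i] = p2m[new_pfn];
        p2m[new_pfn] = old_mfn;
    }
}
```

After the loop, `p2m[i]` holds the original MFN of the page allocated at step i, even when that page's PFN was below i.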
Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Keir Fraser [Mon, 25 Apr 2011 14:27:56 +0000 (15:27 +0100)]
libxc: Fill in XSAVE-related CPUID leaves for PV guests.
Signed-off-by: Keir Fraser <keir@xen.org>
Tim Deegan [Mon, 25 Apr 2011 12:17:05 +0000 (13:17 +0100)]
vtd: check and print EPT compatibility once, at boot.
Merge the check for EPT/VT-D pagetable compatibility into the other
VT-D boot-time checks. Previously it was checking and printing many
times on each VM boot.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Tim Deegan [Wed, 20 Apr 2011 11:02:51 +0000 (12:02 +0100)]
xen/x86: re-enable xsave by default now that it supports live migration.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Mon, 18 Apr 2011 17:34:45 +0000 (18:34 +0100)]
tools: hvmloader: attempt to SHUTDOWN_crash on BUG
Executing UD2 (invalid opcode) triggers a triple fault which signals
reboot to the toolstack, rather than crash.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Keir Fraser [Mon, 18 Apr 2011 17:08:47 +0000 (18:08 +0100)]
hvmloader: Fix _start-relative calculation of hypercall page address.
We got away with it because _start-HYPERCALL_PHYSICAL_ADDRESS happens
to equal HYPERCALL_PHYSICAL_ADDRESS.
Signed-off-by: Keir Fraser <keir@xen.org>
Wei Wang [Mon, 18 Apr 2011 16:24:21 +0000 (17:24 +0100)]
x86/mm: Add a generic interface for vtd and amd iommu p2m sharing.
Also introduce a new parameter (iommu=sharept) to enable this feature.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
Wei Wang [Mon, 18 Apr 2011 16:24:21 +0000 (17:24 +0100)]
x86/mm: Implement p2m table sharing for AMD IOMMU.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
Wei Wang [Mon, 18 Apr 2011 16:24:21 +0000 (17:24 +0100)]
x86/mm: add AMD IOMMU control bits to p2m entries.
This patch adds the next-level bits to bits 9-11 of p2m entries and
the r/w permission bits to bits 61-62 of p2m entries.
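A minimal sketch of packing these fields into a 64-bit entry. The bit positions come from the text above; treating bit 61 as read and bit 62 as write is an assumption, and the macro and function names are hypothetical.

```c
#include <stdint.h>

#define NEXT_LEVEL_SHIFT 9
#define NEXT_LEVEL_MASK  (UINT64_C(0x7) << NEXT_LEVEL_SHIFT)  /* bits 9-11 */
#define IOMMU_READ_BIT   (UINT64_C(1) << 61)                 /* assumed: read */
#define IOMMU_WRITE_BIT  (UINT64_C(1) << 62)                 /* assumed: write */

/* Replace the IOMMU control fields of an entry, leaving other bits intact. */
static uint64_t set_iommu_bits(uint64_t pte, unsigned int next_level,
                               int r, int w)
{
    pte &= ~(NEXT_LEVEL_MASK | IOMMU_READ_BIT | IOMMU_WRITE_BIT);
    pte |= (uint64_t)(next_level & 0x7) << NEXT_LEVEL_SHIFT;
    if (r)
        pte |= IOMMU_READ_BIT;
    if (w)
        pte |= IOMMU_WRITE_BIT;
    return pte;
}

/* Extract the 3-bit next-level field. */
static unsigned int get_next_level(uint64_t pte)
{
    return (unsigned int)((pte & NEXT_LEVEL_MASK) >> NEXT_LEVEL_SHIFT);
}
```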
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
Wei Wang [Mon, 18 Apr 2011 16:24:21 +0000 (17:24 +0100)]
x86/mm: Move p2m type into bits of the PTE that the IOMMU doesn't use.
AMD IOMMU hardware uses bits 9-11 to encode lower page levels, so the
p2m type bits in the p2m flags have to be shifted from bit 9 to bit 12.
Also, bits 52-60 must be zero in an IOMMU PDE, so the definitions of
p2m_ram_rw and p2m_invalid have to be swapped (making the common
p2m_ram_rw type encode as zero).
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
Olaf Hering [Mon, 18 Apr 2011 14:12:04 +0000 (15:12 +0100)]
xentrace: correct overflow check for number of per-cpu trace pages
The calculated number of per-cpu trace pages is stored in t_info and
shared with tools like xentrace. Since it's a u16, the value may
overflow because the current check is based on a u32. Using the u16
means each cpu could in theory use up to 256MB of trace
buffer. However, such a large allocation will currently fail on x86 due
to the MAX_ORDER limit. Check the requested number of pages against both
the maximum theoretical number of pages per cpu and the maximum number
of pages reachable by the struct t_buf->prod/cons variables.
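The clamping can be sketched as follows, assuming 4 KiB pages and that prod/cons are 32-bit byte offsets; the function name is hypothetical. Under these assumptions the u16 limit (65535 pages, just under 256MB) is the tighter of the two.

```c
#include <stdint.h>

#define TRACE_PAGE_SIZE 4096u   /* assumption: 4 KiB pages, as on x86 */

/*
 * Clamp a requested per-cpu page count so it fits both the u16 page count
 * shared via t_info and the byte range addressable by u32 prod/cons.
 */
static unsigned int clamp_trace_pages(unsigned int requested)
{
    unsigned int max_pages = UINT16_MAX;                      /* u16 in t_info */
    unsigned int max_by_offset = UINT32_MAX / TRACE_PAGE_SIZE; /* u32 offsets */

    if (max_by_offset < max_pages)
        max_pages = max_by_offset;
    return requested > max_pages ? max_pages : requested;
}
```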
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Mon, 18 Apr 2011 12:36:10 +0000 (13:36 +0100)]
svm: implement instruction fetch part of DecodeAssist (on #PF/#NPF)
Newer SVM implementations (Bulldozer) copy up to 15 bytes from the
instruction stream into the VMCB when a #PF or #NPF exception is
intercepted. This patch makes use of this information if available.
This saves us from a) traversing the guest's page tables, b) mapping
the guest's memory and c) copying the instructions from there into the
hypervisor's address space.
This speeds up #NPF intercepts quite a lot and avoids cache and TLB
thrashing.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Keir Fraser [Mon, 18 Apr 2011 09:10:02 +0000 (10:10 +0100)]
svm: decode-assists feature must depend on nextrip feature.
...since the decode-assist fast paths assume the nextrip vmcb field is
valid.
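The dependency can be expressed as a feature-mask fixup. The bit positions (NRIPS as bit 3 and DecodeAssists as bit 7 of CPUID Fn8000_000A EDX) are taken from AMD's APM and should be double-checked against the manual; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID Fn8000_000A EDX feature bits (per AMD APM Vol. 2; verify before use). */
#define SVM_FEATURE_NRIPS         (1u << 3)   /* next RIP saved in the VMCB */
#define SVM_FEATURE_DECODEASSISTS (1u << 7)   /* DecodeAssist information */

/* Drop DecodeAssist when nextrip is absent: the fast paths rely on it. */
static uint32_t svm_effective_features(uint32_t edx)
{
    uint32_t feat = edx;

    if (!(feat & SVM_FEATURE_NRIPS))
        feat &= ~SVM_FEATURE_DECODEASSISTS;
    return feat;
}

static bool svm_has_decode_assists(uint32_t edx)
{
    return (svm_effective_features(edx) & SVM_FEATURE_DECODEASSISTS) != 0;
}
```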
Signed-off-by: Keir Fraser <keir@xen.org>
Andre Przywara [Mon, 18 Apr 2011 09:06:37 +0000 (10:06 +0100)]
svm: implement INVLPG part of DecodeAssist
Newer SVM implementations (Bulldozer) give the desired address on
an INVLPG intercept explicitly in the EXITINFO1 field of the VMCB.
Use this address to avoid a costly instruction fetch and decode
cycle.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Andre Przywara [Mon, 18 Apr 2011 09:01:06 +0000 (10:01 +0100)]
svm: implement CR access part of DecodeAssist
Newer SVM implementations (Bulldozer) now explicitly report the general
purpose register used on a MOV-CR intercept. This avoids fetching and
decoding the instruction from the guest's memory and speeds up some
Windows guests, which access CR8 quite often.
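A hypothetical decoder for this case, assuming the APM layout in which EXITINFO1 bit 63 marks an exit caused by a MOV CRx instruction and bits 3:0 give the GPR number. The function name is illustrative; a `false` return means falling back to fetching and decoding the instruction from guest memory.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the GPR operand of a MOV-CR intercept via *gpr, or false if the
 * caller must fetch and decode the instruction itself.
 */
static bool svm_decode_mov_cr_gpr(bool has_decode_assist, uint64_t exitinfo1,
                                  unsigned int *gpr)
{
    if (!has_decode_assist || !(exitinfo1 & (UINT64_C(1) << 63)))
        return false;      /* no assist, or not a MOV CRx (e.g. CLTS/LMSW) */
    *gpr = (unsigned int)(exitinfo1 & 0xf);
    return true;
}
```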
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Andre Przywara [Mon, 18 Apr 2011 08:49:13 +0000 (09:49 +0100)]
svm: add bit definitions for SVM DecodeAssist
Chapter 15.33 of recent APM Vol.2 manuals describes some additions
to SVM called DecodeAssist. Add the newly added fields to the VMCB
structure and name the associated CPUID bit.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser [Mon, 18 Apr 2011 08:47:12 +0000 (09:47 +0100)]
vmx/hvm: move mov-cr handling functions to generic HVM code
CR access intercepts are currently handled very
differently in SVM and VMX. For future use, move the VMX part
into the generic HVM path and use the exported functions.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Keir Fraser [Mon, 18 Apr 2011 04:01:19 +0000 (05:01 +0100)]
hvmloader: Fix build dependency (rombios.o depends on roms.inc)
Also, generate roms.inc file in a scratch location and then move in
place. This is more reliable if make is terminated at an arbitrary
point.
Signed-off-by: Keir Fraser <keir@xen.org>
Keir Fraser [Sat, 16 Apr 2011 19:12:05 +0000 (20:12 +0100)]
x86_32: Fix the build.
Signed-off-by: Keir Fraser <keir@xen.org>
Christoph Egger [Fri, 15 Apr 2011 17:54:57 +0000 (18:54 +0100)]
nestedhvm: Flush L2 guest ASID across guest nestedhvm disable/enable.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Fri, 15 Apr 2011 09:07:42 +0000 (10:07 +0100)]
nestedhvm: Allocate a separate host ASID for each L2 VCPU.
This avoids TLB flushing on every L1/L2 transition.
Signed-off-by: Keir Fraser <keir@xen.org>
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Fri, 15 Apr 2011 07:52:08 +0000 (08:52 +0100)]
x86: don't write_tsc() non-zero values on CPUs updating only the lower 32 bits
This means suppressing the uses in time_calibration_tsc_rendezvous(),
cstate_restore_tsc(), and synchronize_tsc_slave(), and fixes a boot
hang of Linux Dom0 when loading processor.ko on such systems that
have support for C states above C1.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Keir Fraser [Thu, 14 Apr 2011 13:57:24 +0000 (14:57 +0100)]
Tracing facility for nested virtualization
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Keir Fraser [Thu, 14 Apr 2011 13:54:24 +0000 (14:54 +0100)]
hvm, xentrace: Extend HVMTRACE_ND so that a modifier to the basic
event reason can be ORed into the trace record.
Signed-off-by: Keir Fraser <keir@xen.org>
Keir Fraser [Wed, 13 Apr 2011 15:10:26 +0000 (16:10 +0100)]
x86: make the pv-only e820 array be dynamic.
During creation of the PV domain we allocate the E820 structure to
hold as many E820 entries as the machine has, plus three.
This allows the toolstack to fill the E820 with more than three
entries. Specifically, for the use case where the toolstack retrieves
the E820, sanitizes it, and then sets it for the PV guest (for PCI
passthrough), this dynamic number of E820 entries is just right.
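A minimal sketch of the sizing rule, with a simplified E820 entry and a hypothetical container type; only the "host entries plus three" arithmetic comes from the description above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified E820 entry: base address, length and type. */
struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Hypothetical per-domain holder for the PV guest's E820 map. */
struct pv_e820 {
    struct e820entry *map;
    unsigned int capacity;   /* entries allocated */
    unsigned int nr;         /* entries in use */
};

/* Size the array from the host's entry count plus three spare slots. */
static int pv_e820_alloc(struct pv_e820 *e, unsigned int host_nr_entries)
{
    e->capacity = host_nr_entries + 3;
    e->nr = 0;
    e->map = calloc(e->capacity, sizeof(*e->map));
    return e->map ? 0 : -1;
}
```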
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Christoph Egger [Wed, 13 Apr 2011 13:14:59 +0000 (14:14 +0100)]
x86/svm/asid: Introduce svm_invlpga()
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Christoph Egger [Wed, 13 Apr 2011 13:14:32 +0000 (14:14 +0100)]
x86/hvm/asid: Use C99 integer types for asid numbers
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Christoph Egger [Wed, 13 Apr 2011 13:09:55 +0000 (14:09 +0100)]
tools/xentrace: decode 'continue_running' and 'RDTSC' entries.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Tim Deegan [Wed, 13 Apr 2011 08:18:10 +0000 (09:18 +0100)]
Remove "uninitialized_var" macro, which doesn't work with clang.
Since its only user is in ACPI parsing code, the extra overhead of
initializing to 0 is not worth fighting over.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Stephen Smalley [Tue, 12 Apr 2011 13:55:25 +0000 (14:55 +0100)]
xsm: Fix xsm_mmu_* and xsm_update_va_mapping hooks
This is an attempt to properly fix the hypervisor crash previously
described in
http://marc.info/?l=xen-devel&m=128396289707362&w=2
In looking into this issue, I think the proper fix is to move the
xsm_mmu_* and xsm_update_va_mapping hook calls later in the callers,
after more validation has been performed and the page_info struct is
readily available, and pass the page_info to the hooks. This patch
moves the xsm_mmu_normal_update, xsm_mmu_machphys_update and
xsm_update_va_mapping hook calls accordingly, and updates their
interfaces and hook function implementations. This appears to resolve
the crashes for me.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Ian Campbell [Tue, 12 Apr 2011 13:00:49 +0000 (14:00 +0100)]
tools: hvmloader: select BIOS through xenstore.
Allow the toolstack to select the BIOS to use via a xenstore key.
Defaults to "rombios" for compatibility with toolstacks which do not
write the key (e.g. xend).
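A sketch of the selection logic, assuming a simplified `struct bios_config` that carries only a name (the real structure also abstracts the ROM images, load addresses and setup callbacks) and a hypothetical `"seabios"` key value. A NULL value models a toolstack that never wrote the key; what hvmloader does with an unknown name is not described here, so the sketch simply returns NULL.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for hvmloader's struct bios_config. */
struct bios_config {
    const char *name;
};

static const struct bios_config rombios_config = { "rombios" };
static const struct bios_config seabios_config = { "seabios" };

static const struct bios_config *const bios_configs[] = {
    &rombios_config,
    &seabios_config,
};

/* Map the xenstore value (NULL if unset) to a BIOS configuration. */
static const struct bios_config *select_bios(const char *xs_value)
{
    if (xs_value == NULL)
        return &rombios_config;            /* compatibility default */
    for (size_t i = 0; i < sizeof(bios_configs) / sizeof(bios_configs[0]); i++)
        if (strcmp(bios_configs[i]->name, xs_value) == 0)
            return bios_configs[i];
    return NULL;                           /* unknown name */
}
```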
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:47:16 +0000 (13:47 +0100)]
tools: hvmloader: Refactor MP table setup into struct bios_config
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:46:49 +0000 (13:46 +0100)]
tools: hvmloader: Refactor ACPI table setup into struct bios_config
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:46:20 +0000 (13:46 +0100)]
tools: hvmloader: Refactor VM86 and E820 setup into struct bios_config
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Tue, 12 Apr 2011 12:45:59 +0000 (13:45 +0100)]
tools: hvmloader: refactor highbios and bios_info setup into struct bios_config
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:45:31 +0000 (13:45 +0100)]
tools: hvmloader: Refactor APIC, PCI and SMP setup into struct bios_config
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:44:38 +0000 (13:44 +0100)]
tools: hvmloader: add bios_config data structure
For now abstract away the actual ROM bits themselves and the various
load addresses.
Create a rombios.c to contain the ROMBIOS specific parts. ROMBIOS is
still statically selected for the time being.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:43:45 +0000 (13:43 +0100)]
tools: hvmloader: Define $(OBJS) directly instead of via $(SRCS)
$(SRCS) isn't used for anything else.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:41:43 +0000 (13:41 +0100)]
tools: hvmloader: rename roms.h to roms.inc
It's not really a header, it's an autogenerated data file.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:40:49 +0000 (13:40 +0100)]
tools: hvmloader: remove rombios_sz, just use sizeof(rombios)
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:39:56 +0000 (13:39 +0100)]
tools: hvmloader: refactor Makefile to move ROM filenames into variables.
Add an option to use debug Cirrus video BIOS, simply as a convenience.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:39:22 +0000 (13:39 +0100)]
tools: hvmloader: split scratch and hypercall addressing from ROMBIOS low heap.
Although they happen to live at the same physical address, their lifespans
do not overlap. The scratch and hypercall spaces are used only within
hvmloader and the same area is reused as a heap within ROMBIOS. But
each is free to make its own decisions about where to place things.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:38:38 +0000 (13:38 +0100)]
tools: hvmloader: pass option ROM end address around as a parameter.
Reduces the cross talk between ROMBIOS and hvmloader.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:38:01 +0000 (13:38 +0100)]
tools: hvmloader: pass SMBIOS location as a runtime parameter.
Instead of hardcoding in a header.
Reduces the cross talk between ROMBIOS and hvmloader.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:37:03 +0000 (13:37 +0100)]
tools: hvmloader: pass ACPI_PHYSICAL_ADDRESS as a runtime parameter.
Instead of hardcoding in a header.
Reduces the cross talk between ROMBIOS and hvmloader.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:36:17 +0000 (13:36 +0100)]
tools: hvmloader: split e820 support into its own code module.
Pass the table address as a parameter to the build function and cause
it to return the number of entries. Pass both base and offset as
parameters to the dump function.
This adds a duplicated e820.h header to ROMBIOS. Since the e820 data
structure is well defined by existing BIOS implementations I think
this is OK and simplifies the cross talk between hvmloader and
ROMBIOS.
Reduces the cross talk between ROMBIOS and hvmloader.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Ian Campbell [Tue, 12 Apr 2011 12:34:30 +0000 (13:34 +0100)]
tools: hvmloader: move ROMBIOS configuration into tools/firmware/rombios/
Currently rombios and hvmloader are rather intertwined. Separate the
ROMBIOS configuration options out into a ROMBIOS provided file so that
the dependency can become strictly from hvmloader to rombios.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Shriram Rajagopalan [Tue, 12 Apr 2011 12:28:51 +0000 (13:28 +0100)]
remus: fix incorrect error handling for switch_qemu_logdirty in checkpoint code
c/s 22275: "tools: cleanup domain save switch_qemu_logdirty callback"
introduced a whole bunch of error code fixups. In the process, it also
ended up treating the success return code (0) from
switch_qemu_logdirty as an error and vice versa.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Jan Beulich [Tue, 12 Apr 2011 12:27:27 +0000 (13:27 +0100)]
passthrough: prevent non-HVM access to HVM-only data
Spotted this oversight in c/s 23144:37c4f7d492a4.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Wei Wang [Tue, 12 Apr 2011 12:26:19 +0000 (13:26 +0100)]
AMD IOMMU: Fix an interrupt remapping issue
Some devices could generate bogus interrupts if an IO-APIC RTE and the
corresponding IOMMU interrupt remapping entry are inconsistent during
two adjacent 64-bit IO-APIC RTE updates. For example, if the second
operation updates the destination bits in the RTE for a SATA device and
unmasks it, in some cases the SATA device will assert the IO-APIC pin to
generate an interrupt immediately using the new destination, while the
IOMMU still translates it to the old destination, which confuses dom0.
To fix that, we sync up the interrupt remapping entry with the IO-APIC
RTE on every 32-bit operation and forward IO-APIC RTE updates only after
the interrupt remapping entry has been updated.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Wei Wang [Tue, 12 Apr 2011 12:20:57 +0000 (13:20 +0100)]
amd iommu: Unmapped interrupt should generate IO page faults.
This helps us to debug interrupt issues.
Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Sat, 9 Apr 2011 11:42:24 +0000 (12:42 +0100)]
nestedhvm: Remove nhvm_{initialise,destroy,reset}.
They are a pointless level of abstraction beneath nestedhvm_* variants
of the same operations, which all callers should be using.
At the same time, nestedhvm_vcpu_initialise() does not need to call
destroy if initialisation fails. That is the vendor-specific init
function's job (clearing up its own state on failure).
Signed-off-by: Keir Fraser <keir@xen.org>
Jim Fehlig [Fri, 8 Apr 2011 15:56:08 +0000 (16:56 +0100)]
libxl: Add libvirt-xml to userdata userid registry
Signed-off-by: Jim Fehlig <jfehlig@novell.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Shriram Rajagopalan [Fri, 8 Apr 2011 15:49:25 +0000 (16:49 +0100)]
remus: blackhole replication target
The new --null option allows one to test and play with just the
memory checkpointing and network buffering aspect of remus, without
the need for a second host. The disk is not replicated. All replication
data is sent to /dev/null. This option is pretty handy when a user
wants to see the page churn for their workload or observe the latency
hit, though the latter will not be accurate.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Shriram Rajagopalan [Fri, 8 Apr 2011 15:49:04 +0000 (16:49 +0100)]
remus: proper cleanup on checkpoint failure.
While running remus, when an error occurs during checkpointing
(e.g., timeouts on primary, failing to checkpoint network buffer
or disk or even communication failure) the domU is sometimes
left in suspended state on primary. Instead of blindly closing
the checkpoint file handle, attempt to resume the domain before
the close.
Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 8 Apr 2011 15:40:58 +0000 (16:40 +0100)]
libxl: Drop internal DEVICE_TAP backend type
There is no such thing with blktap2, the backend in that case is PHY.
libxl_device_disk_del was just plain wrong in this regard, fix it to
select the appropriate backend_kind.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 8 Apr 2011 15:40:19 +0000 (16:40 +0100)]
libxl: handle the tail end of a tap device using the phy backend handling code
We are literally creating a phy backend on top of a blktap2 created
device anyway so we might as well reuse the code and make this
explicit.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 8 Apr 2011 15:39:53 +0000 (16:39 +0100)]
libxl: refactor DISK_BACKEND_PHY handling in libxl_device_disk_add
A step on the path to sharing this code with the tail-end of the
DISK_BACKEND_TAP case.
I made the result of libxl__blktap_devpath non-const to achieve
this. The existing caller calls libxl__strdup on the result but since
the function is an internal one and the result is already garbage
collected I think this is unnecessary and we can just use the
non-const result directly.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 8 Apr 2011 15:39:19 +0000 (16:39 +0100)]
libxl: only a CDROM type disk can be empty.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 8 Apr 2011 15:38:59 +0000 (16:38 +0100)]
libxl: convert an empty tap disk into a qdisk
I'm not sure that empty disks which are is_cdrom are especially valid,
or that a cdrom can ever be handled by tapdisk anyway but try to do
something sane since it seems that xl's parse_disk_config() routine
could potentially generate such a configuration (although whether from
a valid input string or not I'm not sure).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 8 Apr 2011 15:38:36 +0000 (16:38 +0100)]
libxl: make fallback from blktap2 to qdisk more explicit.
When blktap2 is not present we fall back to qdisk. Instead of falling
through a switch statement, make this explicit, with a comment,
prior to the switch statement.
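A sketch of the explicit fallback, using an enum modelled on the DISK_BACKEND_* names that appear in this log; the function and the returned strings are illustrative. Following the neighbouring commits, a blktap2 device is exported through the phy backend.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative enum modelled on the backend names used in this log. */
enum disk_backend { DISK_BACKEND_PHY, DISK_BACKEND_TAP, DISK_BACKEND_QDISK };

/* Hypothetical: pick the backend kind for a disk. */
static const char *disk_backend_kind(enum disk_backend backend,
                                     bool blktap2_available)
{
    /*
     * No blktap2: fall back to qdisk here, explicitly, instead of relying
     * on a fall-through from the TAP case of the switch below.
     */
    if (backend == DISK_BACKEND_TAP && !blktap2_available)
        backend = DISK_BACKEND_QDISK;

    switch (backend) {
    case DISK_BACKEND_PHY:
        return "phy";
    case DISK_BACKEND_TAP:
        return "phy";      /* blktap2 device is exported via the phy backend */
    case DISK_BACKEND_QDISK:
        return "qdisk";
    }
    return NULL;
}
```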
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 8 Apr 2011 15:38:06 +0000 (16:38 +0100)]
libxl: remove impossible check for backend != DISK_BACKEND_QDISK
In this case we are already in the DISK_BACKEND_QDISK case of a switch
statement on the same variable.
It is possible that we fell through from the DISK_BACKEND_TAP case
(although I'm about to remove that in a subsequent patch), however in
that case we are explicitly falling back from blktap2 to qdisk so
DEVICE_QDISK is still the right answer.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 8 Apr 2011 15:36:20 +0000 (16:36 +0100)]
libxl: drop domid field from libxl_device_*
All functions which add a device to a domain already take a domid
argument and the callers typically write the same value to the
structure right before making the call.
Functions which delete a device typically do not take a domid argument,
but adding one makes the interface more consistent anyway, and all
callers have the domid to hand.
All functions which return a libxl device structure are given a domid
as a parameter and the caller therefore already knows which domain it
is dealing with.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 8 Apr 2011 15:22:51 +0000 (16:22 +0100)]
xend: drop XenAPI error message translation
The only "translation" is to the C locale (i.e. the null
translation). I think it very unlikely we are going to see any new
translations of the XenAPI error messages at this point so the only
purpose of this code appears to be to periodically regenerate
xen-xm.pot with a new embedded timestamp, to the detriment of those of
us who use a version control system.
After much beating with sticks I managed to enable XenAPI support in
xend and configure xm such that it returns "Permission denied." (AKA
the SESSION_AUTHENTICATION_FAILED message) which I take to be a sign
I've not broken things too badly.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 8 Apr 2011 15:21:12 +0000 (16:21 +0100)]
libxl/xl: drop support for netchannel2
netchannel2 was never widely deployed and no supported kernel includes
either the front- or back-ends. The last known kernel with this
support was the xen.git 2.6.31 branch which has been unsupported for
ages.
xl will warn the user if it spots a "vif2" configuration item but
otherwise support is completely removed.
Work is ongoing to add the interesting features of netchannel2 as
protocol extensions to netchannel1.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Fri, 8 Apr 2011 15:17:18 +0000 (16:17 +0100)]
libxl: bump SONAME after binary incompatible change.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tim Deegan [Thu, 7 Apr 2011 14:08:32 +0000 (15:08 +0100)]
xen/lto: if the makefile asks for binary, always build binary
even if the source is a C file.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Tim Deegan [Thu, 7 Apr 2011 14:08:05 +0000 (15:08 +0100)]
xen/x86: explicitly mark start-of-file asm()s as .text
LLVM and gold between them get confused when asm align commands
are emitted before a section marker.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Tim Deegan [Thu, 7 Apr 2011 14:06:47 +0000 (15:06 +0100)]
xen/acpi: disentangle ACPI enumerations.
There are two sets of ACPI table enums and structs, and clang
complains about implicit casts between them. It would be much better
to remove one entire set of ACPI definitions but for now just use the
right enum for each interface.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Tim Deegan [Thu, 7 Apr 2011 14:06:06 +0000 (15:06 +0100)]
xen: another unsigned comparison < 0
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
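The code this commit touched is not shown here. As a general illustration of this class of bug, the sketch below uses a hypothetical `hypothetical_lookup()` that returns a negative value on failure. If that result is stored in an unsigned variable, `-1` becomes `UINT_MAX` and a following `< 0` check can never fire. Keeping the result signed fixes it.

```c
#include <limits.h>

/* Hypothetical helper standing in for any call that returns a
 * negative value on failure. */
static int hypothetical_lookup(int key)
{
    return key > 0 ? key : -1;
}

/* Buggy shape (kept in a comment so it does not trigger warnings):
 *
 *     unsigned int rc = hypothetical_lookup(key);
 *     if ( rc < 0 )       never true: rc cannot be negative
 *         return 1;       and -1 was stored as UINT_MAX
 *
 * Fixed shape: keep the result in a signed variable. */
static int lookup_failed(int key)
{
    int rc = hypothetical_lookup(key);
    return rc < 0;
}

/* Shows the value the buggy code actually compares against zero. */
static unsigned int error_as_unsigned(void)
{
    return (unsigned int)hypothetical_lookup(0);
}
```

With this sketch, `lookup_failed(0)` returns 1, `lookup_failed(5)` returns 0, and `error_as_unsigned()` returns `UINT_MAX`.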
Gianni Tedesco [Thu, 7 Apr 2011 11:13:58 +0000 (12:13 +0100)]
libxc: set all VCPUs online by default in HVM info table
This sets a saner default for the cpu-online-map by setting all bits
to 1. The default assumption ought to be that nr-vcpus ==
nr-vcpus-at-start. If that is not true, then the toolstack must modify
the bitmap, but if it is true, the toolstack oughtn't need to do
anything further.
Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com>
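A minimal sketch of the "all bits set" default, assuming a byte-array bitmap with one bit per VCPU. The names `sketch_hvm_info`, `SKETCH_MAX_VCPUS` and the helper functions are hypothetical and are not libxc's real definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical limit and layout: one bit per VCPU, packed into bytes. */
#define SKETCH_MAX_VCPUS 128

struct sketch_hvm_info {
    uint8_t vcpu_online[(SKETCH_MAX_VCPUS + 7) / 8];
};

/* Default: every bit set, i.e. assume nr-vcpus == nr-vcpus-at-start. */
static void sketch_default_online(struct sketch_hvm_info *info)
{
    memset(info->vcpu_online, 0xff, sizeof(info->vcpu_online));
}

/* Toolstack override for the less common case: clear the bits for
 * VCPUs nr_online and above so they start offline. */
static void sketch_limit_online(struct sketch_hvm_info *info,
                                unsigned int nr_online)
{
    for (unsigned int i = nr_online; i < SKETCH_MAX_VCPUS; i++)
        info->vcpu_online[i / 8] &= (uint8_t)~(1u << (i % 8));
}

static int sketch_vcpu_is_online(const struct sketch_hvm_info *info,
                                 unsigned int vcpu)
{
    return (info->vcpu_online[vcpu / 8] >> (vcpu % 8)) & 1;
}
```

The toolstack only needs to call something like `sketch_limit_online()` when fewer VCPUs should be online at start; otherwise the default already matches.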
Liu, Jinsong [Thu, 7 Apr 2011 11:12:38 +0000 (12:12 +0100)]
X86: offline/broken page handler for pod cache
When a page is offlined, or when a page is found to be broken, the page
may be populated or may be sitting in the PoD cache. This patch handles
an offline/broken page in the PoD cache: it scans the PoD cache and, on
a hit, removes and replaces the page, then puts the offline/broken page
on page_offlined_list/page_broken_list.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
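The scan, remove and replace sequence can be sketched with a plain singly linked list. This is a simplified model: `sketch_page`, the list helpers and the single `offlined_list` are hypothetical, and Xen's real PoD code uses its own page structures, locking, and separate offlined/broken lists.

```c
#include <stddef.h>

/* Hypothetical page type for this sketch only. */
struct sketch_page {
    unsigned long mfn;
    struct sketch_page *next;
};

/* Unlink the entry whose mfn matches. Returns the removed page, or
 * NULL if the page is not on the list (a miss). */
static struct sketch_page *sketch_list_remove(struct sketch_page **head,
                                              unsigned long mfn)
{
    for (struct sketch_page **pp = head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->mfn == mfn) {
            struct sketch_page *pg = *pp;
            *pp = pg->next;
            pg->next = NULL;
            return pg;
        }
    }
    return NULL;
}

static void sketch_list_push(struct sketch_page **head,
                             struct sketch_page *pg)
{
    pg->next = *head;
    *head = pg;
}

/* On a hit in the PoD cache: remove the bad page, put a replacement in
 * the cache, and park the bad page on the offlined list.
 * Returns 1 on a hit, 0 if the page was not in the cache. */
static int sketch_pod_offline_page(struct sketch_page **pod_cache,
                                   struct sketch_page **offlined_list,
                                   struct sketch_page *replacement,
                                   unsigned long bad_mfn)
{
    struct sketch_page *bad = sketch_list_remove(pod_cache, bad_mfn);

    if (bad == NULL)
        return 0;
    sketch_list_push(pod_cache, replacement);
    sketch_list_push(offlined_list, bad);
    return 1;
}
```

Keeping the cache size constant by inserting a replacement means the PoD accounting does not shrink just because one cached page went bad.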
Liu, Jinsong [Thu, 7 Apr 2011 11:12:01 +0000 (12:12 +0100)]
X86: Fix mce offline page bug
c/s 19913 broke the MCE offline-page logic:
For page_state_is(pg, free), the case can never be triggered;
For page_state_is(pg, offlined), the related page was not actually
offlined.
This patch fixes the bug and removes an ambiguous comment.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Tim Deegan [Thu, 7 Apr 2011 10:39:35 +0000 (11:39 +0100)]
x86/hvm: actually init nested HVM state for VCPUs
when nested HVM is enabled after VCPUs are allocated.
The previous patch would fail because the call to
nestedhvm_vcpu_initialise() in the HVM param set code
happens before nestedhvm_enabled(v->domain) is true.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
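One way to satisfy the ordering constraint described above is to mark the domain as nested-HVM-enabled before initialising the VCPUs that already exist. The types and functions below are hypothetical simplifications for illustration, not Xen's actual structures or the code this commit changed.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified domain and VCPU state. */
struct sketch_domain { bool nestedhvm_enabled; };
struct sketch_vcpu   { struct sketch_domain *domain; bool nested_state_ready; };

/* Mirrors the guard described in the commit: initialisation does
 * nothing unless the domain already has nested HVM enabled. */
static void sketch_nestedhvm_vcpu_initialise(struct sketch_vcpu *v)
{
    if (!v->domain->nestedhvm_enabled)
        return;
    v->nested_state_ready = true;
}

/* Param-set path: set the domain flag first, then initialise the VCPUs
 * that were allocated earlier. Reversing these two steps would leave
 * every existing VCPU uninitialised. */
static void sketch_set_nestedhvm_param(struct sketch_domain *d,
                                       struct sketch_vcpu *vcpus, int nr)
{
    d->nestedhvm_enabled = true;
    for (int i = 0; i < nr; i++)
        sketch_nestedhvm_vcpu_initialise(&vcpus[i]);
}
```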
Tim Deegan [Thu, 7 Apr 2011 10:12:55 +0000 (11:12 +0100)]
x86/hvm: Don't unconditionally set up nested HVM state
for domains that aren't going to use it.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Ian Campbell [Wed, 6 Apr 2011 15:50:16 +0000 (16:50 +0100)]
libxl: do not expose libxenctrl/libxenstore headers via libxl.h
This completely removes libxenstore from libxl users' view.
xl still needs libxenctrl directly because it uses the xentoollog
functionality itself, but libxenctrl is no longer exposed to it through
the indirect linkage.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
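A common C technique for keeping lower-level library headers out of a public header is an opaque type: the public header only forward-declares the struct, and its fields (which may use the lower-level library's types) are defined privately. The sketch below shows that general pattern with hypothetical `sketch_ctx` names; it is not libxl's actual code, and for brevity the "public" and "private" parts share one file.

```c
#include <stdlib.h>

/* ---- "Public header" part: callers see only an incomplete type ---- */
typedef struct sketch_ctx sketch_ctx;

sketch_ctx *sketch_ctx_alloc(int lowlevel_handle);
int sketch_ctx_handle(const sketch_ctx *ctx);
void sketch_ctx_free(sketch_ctx *ctx);

/* ---- "Private" part: normally in the library's .c file, the only
 * place that would include the lower-level library headers ---- */
struct sketch_ctx {
    int lowlevel_handle;   /* stands in for e.g. a libxenctrl handle */
};

sketch_ctx *sketch_ctx_alloc(int lowlevel_handle)
{
    sketch_ctx *ctx = malloc(sizeof(*ctx));

    if (ctx != NULL)
        ctx->lowlevel_handle = lowlevel_handle;
    return ctx;
}

int sketch_ctx_handle(const sketch_ctx *ctx)
{
    return ctx->lowlevel_handle;
}

void sketch_ctx_free(sketch_ctx *ctx)
{
    free(ctx);
}
```

Because callers only ever hold a `sketch_ctx *`, they never need the headers that define the private fields, and the private layout can change without touching the public header.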
Keir Fraser [Wed, 6 Apr 2011 14:52:50 +0000 (15:52 +0100)]
xentrace: Move register_cpu_notifier() call into boot-time init.
We can't do it lazily from alloc_trace_bufs(), as that gets called
late if tracing is only enabled later by dom0.
Signed-off-by: Keir Fraser <keir@xen.org>
George Dunlap [Wed, 6 Apr 2011 10:40:54 +0000 (11:40 +0100)]
x86/hvm: load CPU structures from xen versions <=3.4
Xen 4.0 added "msr_tsc_aux" in the middle of the hvm_hw_cpu structure, making
it incompatible with savefiles from Xen 3.4 and earlier. This patch uses the recently introduced
backwards-compatibility infrastructure to convert the old to the new.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
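A cut-down sketch of why a field inserted in the middle needs an explicit conversion: every field after the insertion point moves, so the old record cannot simply be copied over the new one. Apart from `msr_tsc_aux`, the struct and field names are hypothetical placeholders, and zero as the default for the new field is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical, cut-down layouts; the real hvm_hw_cpu has many more
 * fields. */
struct sketch_hw_cpu_compat {      /* older layout */
    uint64_t rip;
    uint64_t rsp;
    uint64_t tsc;
};

struct sketch_hw_cpu {             /* newer layout */
    uint64_t rip;
    uint64_t rsp;
    uint64_t msr_tsc_aux;          /* field inserted in the middle */
    uint64_t tsc;
};

/* Convert old -> new field by field. A single memcpy() would put the
 * old tsc value into msr_tsc_aux and leave tsc unset. */
static void sketch_cpu_convert(const struct sketch_hw_cpu_compat *old,
                               struct sketch_hw_cpu *new_cpu)
{
    new_cpu->rip = old->rip;
    new_cpu->rsp = old->rsp;
    new_cpu->msr_tsc_aux = 0;      /* assumed default for old savefiles */
    new_cpu->tsc = old->tsc;
}
```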
George Dunlap [Wed, 6 Apr 2011 10:40:51 +0000 (11:40 +0100)]
hvm: infrastructure for backwards-compatible loading
The hvm_save code is used to save and restore hypervisor-related
hvm state, either for classic save/restore, or for migration
(including Remus). The format is meant to be backwards-compatible across
some hypervisor versions, but if it does change, there is currently no way
to handle the old format as well as the new.
This patch introduces the infrastructure to allow a single older
version ("compat") of any given "save type" to be defined, along with
a function to turn the "old" version into the "new" version. If the
size check fails for the "normal" version, it will check the "compat"
version, and if it matches, will read the old entry and call the
conversion function.
This patch involves some preprocessor hackery, but I'm only extending the
hackery that's already there.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
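The size-based fallback described in this commit can be sketched as follows, assuming one current record layout and one older compat layout. All names are hypothetical; per the commit, the real implementation extends the existing hvm_save preprocessor machinery rather than using hand-written functions like these.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record types: the current ("normal") layout and one
 * older ("compat") layout. */
struct sketch_rec_compat { unsigned int a; };
struct sketch_rec        { unsigned int a; unsigned int b; };

/* Conversion function from the compat layout to the current one. */
static void sketch_rec_convert(const struct sketch_rec_compat *old,
                               struct sketch_rec *rec)
{
    rec->a = old->a;
    rec->b = 0;                 /* new field gets a default */
}

/* Load one entry of 'len' bytes: try the normal size first, then the
 * compat size; anything else is rejected. Returns 0 on success. */
static int sketch_load_entry(const void *buf, size_t len,
                             struct sketch_rec *out)
{
    if (len == sizeof(struct sketch_rec)) {
        memcpy(out, buf, sizeof(*out));
        return 0;
    }
    if (len == sizeof(struct sketch_rec_compat)) {
        struct sketch_rec_compat old;

        memcpy(&old, buf, sizeof(old));
        sketch_rec_convert(&old, out);
        return 0;
    }
    return -1;                  /* neither size matches */
}
```

Checking the normal size first means current savefiles never pay for the compat path; only a size mismatch triggers the fallback.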
Tim Deegan [Wed, 6 Apr 2011 10:22:39 +0000 (11:22 +0100)]
Nested SVM: fix race in remote shootdown.
nestedhvm_flushtlb_ipi() can run between nsvm_vcpu_switch() and CLGI,
which would leave the VMCB pointing at the wrong p2m table.
Check for this after CLGI.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Tim Deegan [Wed, 6 Apr 2011 10:22:39 +0000 (11:22 +0100)]
xen: fix non-debug and 32-bit builds after nested HVM series
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
cegger [Tue, 5 Apr 2011 13:44:09 +0000 (15:44 +0200)]
Implement Nested-on-Nested.
This allows the guest to run a nested guest with HAP enabled.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
cegger [Wed, 9 Mar 2011 11:36:23 +0000 (12:36 +0100)]
Implement generic piece to finally enable nested virtualization
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
cegger [Wed, 9 Mar 2011 11:36:17 +0000 (12:36 +0100)]
Implement SVM specific interrupt handling
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
cegger [Wed, 9 Mar 2011 11:36:05 +0000 (12:36 +0100)]
Implement SVM specific part for Nested Virtualization
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
cegger [Mon, 28 Feb 2011 11:21:57 +0000 (12:21 +0100)]
Handle interrupts (generic part)
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
cegger [Mon, 28 Feb 2011 11:21:54 +0000 (12:21 +0100)]
Allow guest to enable SVM in EFER
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
cegger [Mon, 28 Feb 2011 11:21:52 +0000 (12:21 +0100)]
When injecting an exception into an L2 guest,
inject a #VMEXIT instead if the L1 guest intercepts the exception
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
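The routing decision can be sketched as a bitmap test. On SVM, the VMCB holds a 32-bit exception-intercept bitmap indexed by vector; in this sketch `l1_exception_intercepts` stands for the bitmap the L1 guest programmed for its L2 guest, and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical outcome codes for this sketch. */
enum sketch_inject_action {
    SKETCH_INJECT_INTO_L2,     /* deliver the exception to the L2 guest */
    SKETCH_VMEXIT_TO_L1        /* reflect it to L1 as a #VMEXIT */
};

/* If L1 asked to intercept this exception vector, L1 must see a
 * #VMEXIT; otherwise the exception goes straight into L2. */
static enum sketch_inject_action
sketch_route_exception(uint32_t l1_exception_intercepts, unsigned int vector)
{
    if (vector < 32 && (l1_exception_intercepts & (1u << vector)))
        return SKETCH_VMEXIT_TO_L1;
    return SKETCH_INJECT_INTO_L2;
}
```

For example, if L1 intercepts #PF (vector 14), a page fault destined for L2 becomes a #VMEXIT to L1, while a #GP (vector 13) is still injected into L2.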
cegger [Mon, 28 Feb 2011 11:21:49 +0000 (12:21 +0100)]
Allow paged real mode during vmrun emulation.
Emulate CR0 and CR4 when the guest does not intercept them.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
cegger [Mon, 28 Feb 2011 11:21:46 +0000 (12:21 +0100)]
Nested Virtualization core implementation
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
cegger [Mon, 28 Feb 2011 11:21:44 +0000 (12:21 +0100)]
add nestedhvm function hooks for svm/vmx specific code
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
cegger [Mon, 28 Feb 2011 11:21:41 +0000 (12:21 +0100)]
Data structures for Nested Virtualization
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
cegger [Mon, 28 Feb 2011 11:21:38 +0000 (12:21 +0100)]
tools: Add nestedhvm guest config option
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Eddie Dong <eddie.dong@intel.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Committed-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Wed, 6 Apr 2011 08:20:40 +0000 (09:20 +0100)]
Remove __init specifier from function declarations in header files.
The specifier only needs to be added to the function's definition.
At the same time, fix init_cpu_to_node() to be __init rather than
__devinit (it is only called at boot time).
Signed-off-by: Keir Fraser <keir@xen.org>
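A small illustration of the rule, assuming a GCC- or Clang-style ELF toolchain. `sketch_init` is a hypothetical stand-in for `__init` that places the function in a separate section; the header-style declaration carries no attribute, and only the definition does.

```c
/* Hypothetical stand-in for __init: place the function in its own
 * section (assumes an ELF target such as Linux). */
#define sketch_init __attribute__((section(".init.text")))

/* Header-style declaration: no section attribute needed here. */
int sketch_setup(int x);

/* The attribute goes only on the definition, which is where the
 * compiler decides which section the code is emitted into. */
sketch_init int sketch_setup(int x)
{
    return x * 2;
}
```

Callers only need the declaration to generate a call; section placement matters solely where the function body is emitted.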
Allen Kay [Wed, 6 Apr 2011 08:11:02 +0000 (09:11 +0100)]
[VTD] Fixes to ACPI DMAR flag checks.
* platform_supports_{intremap,x2apic} should not be marked __init as
they are used during S3 resume.
* DMAR flags should be taken from the table passed to
acpi_parse_dmar() -- this is the trusted copy of the DMAR, when
running in TXT mode.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
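A sketch of both points: the flags are cached from the table handed to the parser, and the query helpers are ordinary (not init-only) functions so they can be called again on S3 resume. The bit values follow the VT-d specification (bit 0 for interrupt remapping, bit 1 for x2APIC opt-out), but `sketch_dmar` is a hypothetical, cut-down view of the DMAR header, and the helpers omit other checks the real code makes (such as CPU x2APIC support).

```c
#include <stdbool.h>
#include <stdint.h>

/* DMAR flag bits as defined by the VT-d specification. */
#define SKETCH_DMAR_INTR_REMAP     (1u << 0)
#define SKETCH_DMAR_X2APIC_OPT_OUT (1u << 1)

/* Hypothetical, cut-down view of the DMAR table header. */
struct sketch_dmar {
    uint8_t host_address_width;
    uint8_t flags;
};

static uint8_t sketch_dmar_flags;   /* cached at parse time */

/* Parse-time hook: take the flags from the table we were handed
 * (the trusted copy when running under TXT). */
static void sketch_parse_dmar(const struct sketch_dmar *dmar)
{
    sketch_dmar_flags = dmar->flags;
}

/* Not boot-only: may run again on S3 resume. */
static bool sketch_platform_supports_intremap(void)
{
    return (sketch_dmar_flags & SKETCH_DMAR_INTR_REMAP) != 0;
}

/* x2APIC needs interrupt remapping and no firmware opt-out. */
static bool sketch_platform_supports_x2apic(void)
{
    unsigned int mask = SKETCH_DMAR_INTR_REMAP | SKETCH_DMAR_X2APIC_OPT_OUT;

    return (sketch_dmar_flags & mask) == SKETCH_DMAR_INTR_REMAP;
}
```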